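The Great Divergence marks a tectonic shift in microprocessor history. Between 2001 and 2009, the performance trajectories of central processing units (CPUs) and graphics processing units (GPUs) split apart, opening a large capability gap. While traditional CPUs ran into the Power Wall, where raising clock frequencies produced unsustainable heat, GPUs drew on their massive consumer installed base in the gaming market to fund a shift toward extreme parallelism.
Key Turning Point
By 2003, the gap began to widen. CPUs stayed focused on sequential logic and low latency, while GPUs concentrated their transistor budget on arithmetic logic units (ALUs). As a result, GPU throughput moved from gigaflops (GFLOPS) to teraflops (TFLOPS), while CPUs followed a flatter growth curve.
As of 2009, a high-end Intel Core i7-960 delivered about 70 GFLOPS, while the NVIDIA GTX 280 reached nearly 933 GFLOPS. This was not merely a speed increase; it was a fundamental restructuring of how computation is done, one that prioritizes throughput over the speed of any single instruction.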
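To make the size of the 2009 gap concrete, the short sketch below computes the GPU-to-CPU throughput ratio from the two approximate figures quoted above. The constant and function names are illustrative choices, not part of the lesson.

```python
# Approximate peak figures quoted in the text (2009 snapshot).
CPU_GFLOPS = 70.0   # Intel Core i7-960
GPU_GFLOPS = 933.0  # NVIDIA GTX 280


def throughput_ratio(gpu_gflops, cpu_gflops):
    """Return how many times higher the GPU's peak throughput is."""
    return gpu_gflops / cpu_gflops


def to_teraflops(gflops):
    """Convert GFLOPS (10^9 FLOP/s) to TFLOPS (10^12 FLOP/s)."""
    return gflops / 1000.0


ratio = throughput_ratio(GPU_GFLOPS, CPU_GFLOPS)
print(f"GPU/CPU ratio: {ratio:.1f}x")
print(f"GPU peak: {to_teraflops(GPU_GFLOPS):.3f} TFLOPS")
```

The ratio comes out a little above 13, so the GPU figure sits just under the one-teraflop mark while the CPU figure is more than an order of magnitude lower.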
QUESTION 1
What primary constraint led to the 'Power Wall' for traditional CPUs?
The lack of available memory in the early 2000s.
Thermal and power limitations when increasing clock speeds.
A shortage of transistors on the silicon die.
The transition from 32-bit to 64-bit architectures.
✅ Correct!
As clock speeds increased, power consumption and heat dissipation became unmanageable for single-core serial processors.
❌ Incorrect
The Power Wall refers specifically to the energy and heat limits of pushing sequential clock frequencies higher.
QUESTION 2
According to the Great Divergence, which industry provided the economic engine for GPU R&D?
The Financial High-Frequency Trading market.
The Oil and Gas seismic exploration industry.
The Video Game industry.
The Cryptocurrency mining industry.
✅ Correct!
Exactly. The massive consumer installed base of gamers funded the rapid iteration of parallel graphics hardware.
❌ Incorrect
While those industries use GPUs now, the video game market was the original driver of mass-market GPU evolution.
QUESTION 3
By 2009, how did the peak performance of an NVIDIA GTX 280 compare to an Intel Core i7-960?
They were roughly equal in throughput.
The CPU was twice as fast as the GPU.
The GPU was more than an order of magnitude higher (~13x).
The GPU was 100x faster than the CPU.
✅ Correct!
The GTX 280 offered ~933 GFLOPS compared to the i7's ~70 GFLOPS, representing a massive throughput delta.
❌ Incorrect
Look at the 2009 snapshot: 933 GFLOPS (GPU) vs 70 GFLOPS (CPU). That is more than a 10x difference.
QUESTION 4
GPUs achieve higher throughput by dedicating more transistors to which component?
Large Level-3 Caches.
Complex Branch Prediction logic.
Arithmetic Logic Units (ALUs).
Instruction Decoders.
✅ Correct!
GPUs prioritize raw math execution units (ALUs) over the complex control logic used by CPUs to speed up sequential code.
❌ Incorrect
CPUs use transistors for Control and Cache; GPUs use them for ALUs to perform massive parallel math.
QUESTION 5
What is the correct unit for measuring one trillion floating-point operations per second?
GFLOPS.
Teraflops.
Petaflops.
Megaflops.
✅ Correct!
Tera (T) stands for trillion ($10^{12}$), while Giga (G) stands for billion ($10^{9}$).
❌ Incorrect
GFLOPS are billions; Teraflops are trillions.
Case Study: The 2003 Turning Point
Architectural Analysis of Figure 1.1
A hardware architect in 2003 is comparing the Pentium 4 to the GeForce FX 5800. At this moment, the performance lines on Figure 1.1 are still close together. However, within 5 years, the GPU line will accelerate exponentially while the CPU line remains linear.
Q1. Why did the GPU trajectory become exponential while the CPU remained linear?
Solution:
GPUs are designed for data parallelism, meaning performance scales directly with the number of processing cores (ALUs) added. CPUs are limited by sequential dependencies and the complexity of managing single-thread instruction pipelines under power constraints.
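The contrast between data parallelism and sequential dependencies can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the "workers" are simulated by slicing lists rather than by real parallel hardware, and SAXPY (`a * x + y`) is chosen here as a stand-in for a typical element-wise workload.

```python
def saxpy_chunk(a, xs, ys):
    """Element-wise a*x + y. No element depends on any other element."""
    return [a * x + y for x, y in zip(xs, ys)]


def saxpy_chunked(a, xs, ys, workers):
    """Split the work into independent chunks, one per simulated worker.

    Because the chunks share no state, they could run on separate ALUs;
    here they run one after another purely for illustration.
    """
    size = max(1, -(-len(xs) // workers))  # ceiling division, at least 1
    out = []
    for start in range(0, len(xs), size):
        out.extend(saxpy_chunk(a, xs[start:start + size], ys[start:start + size]))
    return out


def running_recurrence(a, xs):
    """acc[i] = a * acc[i-1] + x[i]. Each step needs the previous result,
    so the loop cannot simply be sliced across workers."""
    acc = 0.0
    out = []
    for x in xs:
        acc = a * acc + x
        out.append(acc)
    return out


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0] * 5
# The chunked result matches the single-pass result for any worker count.
print(saxpy_chunked(2.0, xs, ys, workers=2) == saxpy_chunk(2.0, xs, ys))
print(running_recurrence(2.0, [1.0, 1.0, 1.0]))
```

The first function gives the same answer however the input is divided, which is why adding execution units helps; the recurrence carries a dependency from one step to the next, which is the kind of work CPUs are built to run quickly.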
Q2. What metric on the Y-axis of Figure 1.1 defines this 'Divergence'?
Solution:
Theoretical peak GFLOPS (billions of floating-point operations per second), which measures the maximum floating-point throughput the hardware could reach in principle. It is an upper bound, not a sustained rate.
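A theoretical peak figure is usually derived by multiplying the number of execution units by the clock rate and by the operations each unit can issue per cycle. The sketch below applies that product to the GTX 280. The hardware parameters (240 shader cores, a 1.296 GHz shader clock, 3 floating-point operations per cycle per core) are commonly published specifications and are assumptions here, since the lesson does not state them; the function name is illustrative.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak throughput in GFLOPS.

    cores            number of execution units (assumed spec)
    clock_ghz        clock rate in GHz (assumed spec)
    flops_per_cycle  floating-point ops each unit issues per cycle (assumed spec)
    """
    return cores * clock_ghz * flops_per_cycle


# Assumed GTX 280 parameters; not given in the lesson text.
gtx280 = peak_gflops(cores=240, clock_ghz=1.296, flops_per_cycle=3)
print(f"GTX 280 theoretical peak: {gtx280:.1f} GFLOPS")
```

Under those assumptions the product lands at roughly 933 GFLOPS, in line with the figure quoted in the lesson. Real workloads rarely reach this number, because it assumes every unit issues its maximum number of operations on every cycle.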